---
title: "Digital Inequality Manuscript Supporting Code"
author: "Ashley Cate, Steven Moen, and Kaipeng Chen"
output: pdf_document
---

This code contains several sections, which can be used to recreate the key items in the paper. Please do the following in order to run the code:

- You may need to run the line below "# NOTE: You may need to install these packages". To run these lines, you'll first need to uncomment lines 32 through 34 by deleting the pound symbol in front of them.
- You will need to point the read.csv sections to wherever the required data is saved. These are noted with "# NOTE: You may need to update this file location". You'll also need to update the output file path to save some of the plots, indicated by the same note.

This code is sufficient to create everything in the manuscript. Below is a summary of each section.

0. Setup reads in libraries and performs data checks.
1. Visualizations produces many graphs of the data, some of which are in the manuscript.
2. Granger Causality Analysis shows the analysis where we test the effect of funding rate in predicting subscription rates and vice-versa
3. Drivers of Subscription Rates shows what influences the subscription rates in a large model
4. Assumption Checking looks into the effects of spatial correlation and other general model assumptions
5. Supplemental Tables contains summary tables that are included in the online supplemental material

On a 2020 M1 MacBook pro with 16 GB of ram, the code below ran successfully in under 20 seconds using R 4.4.1 and RStudio version 2024.04.2+764. All packages were updated to their most recent version on September 27th, 2024. You should see "The code ran successfully!" in the console if there are no issues.

# 0. Setup

```{r}
# Record the system time
tic <- Sys.time()
```

```{r}
# NOTE: You may need to install these packages
# install.packages(c("dplyr", "MASS", "emdi", "ggplot2", "magrittr", "kableExtra",
#                   "car", "lmtest", "geoR", "olsrr", "betareg", "AER", "plm",
#                  "USpopcenters", "scales", "tibble"))

# Load in libraries
library(dplyr)
library(MASS)
library(emdi)
library(ggplot2)
library(magrittr)
library(kableExtra)
library(car)
library(lmtest)
library(geoR)
library(olsrr)
library(betareg)
library(AER)
library(plm)
library(USpopcenters)
library(scales)
library(tibble)
```

```{r}
# NOTE: You may need to update this file location

# Output location
output_plot_path = "~/Documents/AC\ Datafiles/Histograms_07112024"
```

```{r}
# NOTE: You may need to update this file location

# Read in the data
plm_data <- read.csv("DigIneq_Manuscript_Data_PanelFormat_v5.csv", na.strings=c("","NA")) 
```

## Data Checks

```{r}
# Replace missing data with zeros
plm_data[is.na(plm_data)] <- 0 

# Check for complete cases
sum(!complete.cases(plm_data))

# Funding per capita
plm_data$Funding_per_Capita <- plm_data$Funding/plm_data$Population
```

```{r data review}
head(plm_data)
```

```{r}
# Append a factor year column
plm_data$Year_fact <- as.factor(plm_data$Year)

# Convert county to factor
plm_data$County <- as.factor(plm_data$County)

# Convert HHI to a numeric variable
plm_data$HHI_num <- as.numeric(gsub(",","",plm_data$HHI))

# Cut out 2022
plm_data_small <- subset(plm_data, Year < 2022)
```

```{r}
# Sort the data
plm_data_sort <- plm_data_small %>% arrange(County)
summary(plm_data_sort)
```

# 1. Visualizations

Some of the figures are not in order - this is intentional. There are code manipulations that need to happen for some plots. But the plots are all here.

```{r}
# Feed into a model
mod_init <- lm(SubscriptionRate_Any ~ County + County*I(Year-2017), data = plm_data_sort)
summary(mod_init)
plot(mod_init)
```

```{r}
# Include a predict argument
plm_data_sort$Predicted_Subscription_Rate <- predict.lm(mod_init, plm_data_sort)

# Write a grouping function to take the average per-capita funding and RUCC over the years
plm_data_agg <- plm_data_sort %>% group_by(County) %>% 
  summarize(Mean_Funding_per_Capita = mean(Funding_per_Capita),
            Mean_RUCC = mean(RUCC_2013))

# Left join
plm_data_sort <- merge(x = plm_data_sort, y = plm_data_agg,
      by = "County", all.x = TRUE)
```

## Per capita funding plot

```{r}
plm_data_sort %>% ggplot(aes(x = Year, y = Predicted_Subscription_Rate, group = County, color = Mean_Funding_per_Capita)) + 
  geom_line() + 
  scale_color_gradient(low="red", high="green") +
  labs(title = "Subscription Rates over Time",
       y = "Modeled Subscription Rates",
       color = "Avg. Per Capita Funding (2017-21)",
       caption = "Each county modeled using a simple linear regression with different intercepts and slopes")
```

## Figure 2: Linear RUCC plot

```{r}
plm_data_sort %>% ggplot(aes(x = Year, y = Predicted_Subscription_Rate, group = County, color = Mean_RUCC)) + 
  geom_line() + 
  # scale_color_gradient(low="red", high="green") +
  labs(title = "Subscription Rates over Time",
       y = "Modeled Subscription Rates",
       color = "RUCC",
       caption = "Each county modeled using a simple linear regression with different intercepts and slopes")
```

## HHI

```{r}
plm_data_sort %>% ggplot(aes(x = Year, y = Predicted_Subscription_Rate, group = County, color = HHI_num)) + 
  geom_line() + 
  scale_color_gradient(low="black", high="green") +
  labs(title = "Subscription Rates over Time",
       y = "Modeled Subscription Rates",
       color = "Household Income",
       caption = "Each county modeled using a simple linear regression with different intercepts and slopes")
```

## HHI over time plot (alt)

```{r}
plm_data_sort %>% ggplot(aes(x = Year, y = Predicted_Subscription_Rate, group = County, color = HHI_num)) + 
  geom_line() + 
  # scale_color_gradient(low="red", high="green") +
  labs(title = "Subscription Rates over Time",
       y = "Modeled Subscription Rates",
       color = "Household Income",
       caption = "Each county modeled using a simple linear regression with different intercepts and slopes")
```

## Population Plot

```{r}
plm_data_sort %>% ggplot(aes(x = Year, y = Predicted_Subscription_Rate, group = County, color = Population)) + 
  geom_line() + 
  scale_color_gradient(low="grey", high="orange") +
  labs(title = "Subscription Rates over Time",
       y = "Modeled Subscription Rates",
       color = "Population",
       caption = "Each county modeled using a simple linear regression with different intercepts and slopes")
```

## Figure 3: Per Capita Income Plot

```{r}
plm_data_sort %>% ggplot(aes(x = Year, y = Predicted_Subscription_Rate, group = County, color = Per.Capita.Income)) + 
  geom_line() + 
  scale_color_gradient(low="black", high="green") +
  labs(title = "Subscription Rates over Time",
       y = "Modeled Subscription Rates",
       color = "Per Capita Income",
       caption = "Each county modeled using a simple linear regression with different intercepts and slopes")
```

## Figure 4: Household Income Plot

```{r}
plm_data_sort %>% ggplot(aes(x = Year, y = Predicted_Subscription_Rate, group = County, color = HHI_num)) + 
  geom_line() + 
  scale_color_gradient(low="black", high="green") +
  labs(title = "Subscription Rates over Time",
       y = "Modeled Subscription Rates",
       color = "Household Income",
       caption = "Each county modeled using a simple linear regression with different intercepts and slopes")
```

## Figure 5: Median Age Plot

```{r}
plm_data_sort %>% ggplot(aes(x = Year, y = Predicted_Subscription_Rate, group = County, color = Age)) + 
  geom_line() + 
  scale_color_gradient(low="black", high="yellow") +
  labs(title = "Subscription Rates over Time",
       y = "Modeled Subscription Rates",
       color = "Median Age",
       caption = "Each county modeled using a simple linear regression with different intercepts and slopes")
```

## Figure 6: High School Dropout Plot

```{r}
plm_data_sort %>% ggplot(aes(x = Year, y = Predicted_Subscription_Rate, group = County, color = Education_LessThanHighSchoolGraduate)) + 
  geom_line() + 
  scale_color_gradient(low="black", high="red") +
  labs(title = "Subscription Rates over Time",
       y = "Modeled Subscription Rates",
       color = "% High School Dropout",
       caption = "Each county modeled using a simple linear regression with different intercepts and slopes")
```

## Funding over Time Model

```{r}
# Feed into a model
mod_funding_over_time <- lm(Funding_per_Capita ~ County + County*I(Year-2017), data = plm_data_sort)
summary(mod_funding_over_time)
plot(mod_funding_over_time)
```

```{r}
# Include a predict argument
plm_data_sort$Predicted_Funding_per_Capita <- predict.lm(mod_funding_over_time, plm_data_sort)

# Write a grouping function to take the average per-capita funding and RUCC over the years
plm_data_agg_2 <- plm_data_sort %>% group_by(County) %>% 
  summarize(Mean_SubscriptionRate_Any = mean(SubscriptionRate_Any))

# Left join
plm_data_sort_2 <- merge(x = plm_data_sort, y = plm_data_agg_2,
      by = "County", all.x = TRUE)

# Join in average by year
plm_data_agg_3 <- plm_data_sort %>% group_by(Year) %>% 
  summarize(Mean_SubscriptionRate_Any_Year = mean(SubscriptionRate_Any),
            SD_SubscriptionRate_Any_Year = sd(SubscriptionRate_Any),
            Mean_RUCC_Year = mean(RUCC_2013))

# Join this again, and compute Subscription Rate delta from average
plm_data_sort_2 <- merge(x = plm_data_sort_2, y = plm_data_agg_3,
      by = "Year", all.x = TRUE)

# Produce a delta subscription value
plm_data_sort_2$Mean_SubscriptionRate_Delta = plm_data_sort_2$Mean_SubscriptionRate_Any - plm_data_sort_2$Mean_SubscriptionRate_Any_Year

# Produce a delta subscription z-score
plm_data_sort_2$Mean_SubscriptionRate_Z_Score = (plm_data_sort_2$Mean_SubscriptionRate_Any - plm_data_sort_2$Mean_SubscriptionRate_Any_Year)/plm_data_sort_2$SD_SubscriptionRate_Any_Year
```

## Per capita funding plot - non square root scale

```{r}
plm_data_sort_2 %>% ggplot(aes(x = Year, y = Funding_per_Capita, group = County, color = SubscriptionRate_Any)) + 
  geom_jitter(width = 0.15) + 
  scale_color_gradient(low="red", high="green") +
  # scale_y_sqrt() + 
  labs(title = "Funding Per Capita over Time",
       y = "Square Root of Funding Per Capita",
       color = "Subscription Rate (2017-21)")
```

## Per capita funding plot

```{r}
plm_data_sort_2 %>% ggplot(aes(x = Year, y = Funding_per_Capita, group = County, color = SubscriptionRate_Any)) + 
  geom_jitter(width = 0.15) + 
  scale_color_gradient(low="red", high="green") +
  scale_y_sqrt() + 
  labs(title = "Funding Per Capita over Time",
       y = "Funding Per Capita",
       color = "Subscription Rate (2017-21)")
```

```{r}
# In this plot, we include the difference relative to the average in Wisconsin as the coloring factor. This accounts for the fact that the subscription rate increases over time
plm_data_sort_2 %>% ggplot(aes(x = Year, y = Funding_per_Capita, group = County, color = Mean_SubscriptionRate_Delta)) + 
  geom_jitter(width = 0.15) + 
  scale_color_gradient(low="red", high="green") +
  scale_y_sqrt() + 
  labs(title = "Funding Per Capita over Time",
       y = "Funding Per Capita",
       color = "Sub. Rate rel. to Mean by Year (2017-21)")
```

```{r}
# In this plot, we include the difference relative to the average in Wisconsin (normalized for spread) as the coloring factor. This accounts for the fact that the subscription rate increases over time
plm_data_sort_2 %>% ggplot(aes(x = Year, y = Funding_per_Capita, group = County, color = Mean_SubscriptionRate_Z_Score)) + 
  geom_jitter(width = 0.15) + 
  scale_color_gradient(low="red", high="green") +
  scale_y_sqrt() + 
  labs(title = "Funding Per Capita over Time",
       y = "Square Root of Funding Per Capita",
       color = "Sub. Rate Z-Score (2017-21)")
```

## Figure 1: RUCC Plot

```{r}
plm_data_sort_2 %>% ggplot(aes(x = Year, y = SubscriptionRate_Any, color = RUCC_2013)) + 
  geom_jitter(width = 0.15) + 
  # geom_smooth(method = "lm", se = F, lty = 2) + 
  # scale_color_gradient(low="red", high="green") +
  # scale_y_sqrt() + 
  labs(title = "Subscription Rate over Time",
       y = "Subscription Rate",
       color = "RUCC")
```

## RUCC plot

```{r}
plm_data_sort_2 %>% ggplot(aes(x = Year, y = Funding_per_Capita, group = County, color = Mean_RUCC)) + 
  geom_jitter(width = 0.15) + 
  # scale_color_gradient(low="red", high="green") +
  scale_y_sqrt() + 
  labs(title = "Funding Per Capita over Time",
       y = "Square Root of Funding Per Capita",
       color = "Mean RUCC")
```

## Supplemental Data Description Analysis 

```{r}
# A function to compute the number of bins
bins_fd <- function(vec) {
  # Feed into function
  diff(range(vec)) / (2 * IQR(vec) / length(vec)^(1/3))
}
```

```{r}
#' A function to plot a histogram and compute summary statistics
#'
#' @param x_var - the x variable to look at
#' @param df_in - the input data frame
#' @param x_lab - the label for the x-variable
#' @param filepath - the output filepath to save the histogram
#' @param comma_x - add commas to the x-axis
#' @param dollar_x - make the x-axis use dollar formatting
#' @param percent_x - use percentages on the x-axis
#'
#' @return - prints a table and a graph
#' @export - exports a table to the specified filepath
#'
#' @examples - plot_hist(x_var = Population, x_lab = "Population", comma_x = T)
plot_hist <- function(x_var, df_in = plm_data_sort, x_lab, filepath = output_plot_path,
                      comma_x = F,
                      dollar_x = F,
                      percent_x = F) {
  # Calculate binwidth using Freedman-Diaconis rule
  input <- df_in %>% pull({{x_var}})
  # Compute binwidth
  bw <- 2*IQR(input)/length(input)^(1/3)
  # Create a histogram
  hist <- df_in %>% ggplot(aes(x = {{x_var}})) + geom_histogram(binwidth = bw) +
    ylab("Count") +
    # Fix the axis
    {if(comma_x)scale_x_continuous(labels = scales::comma)}+
    {if(dollar_x)scale_x_continuous(labels = scales::dollar_format())}+
    {if(percent_x)scale_x_continuous(labels = function(x) paste0(x, "%"))}+
  theme(axis.title.x=element_blank(), plot.margin = margin(20,20,20,20))
  print(hist)
  # Compute mean, median, SD
  df_ss <- df_in %>% dplyr::summarize(Mean = mean({{x_var}}),
                                      Median = median({{x_var}}),
                                      `Standard Deviation` = sd({{x_var}}),
                                      Min = min({{x_var}}),
                                      Max = max({{x_var}})) %>% 
    as.data.frame() %>% 
    # Round the numbers 
     dplyr::mutate_if(is.numeric, round, 2) %>% 
  prettyNum(big.mark = ",", scientific = FALSE)
  print(df_ss)
  # Save
  ggsave(filename=paste0(x_lab, "_hist.jpg"), plot=hist, path = filepath, dpi = 600)
}
```

## Table S5 and Supporting Histograms

```{r}
# Run the function
plot_hist(x_var = Population, x_lab = "Population", comma_x = T)
plot_hist(x_var = Grants, x_lab = "Grants", comma_x = T)
plot_hist(x_var = Funding, x_lab = "Funding", dollar_x = T)
plot_hist(x_var = SubscriptionRate_Any, x_lab = "Subscription Rate", percent_x = T)
plot_hist(x_var = Education_LessThanHighSchoolGraduate, x_lab = "Less than High School Education", percent_x = T)
plot_hist(x_var = Education_HighSchoolGraduate, x_lab = "High School Education", percent_x = T)
plot_hist(x_var = Education_SomeCollegeAssociates, x_lab = "Some College or Associates Degree", percent_x = T)
plot_hist(x_var = Education_Bachelor.s, x_lab = "Bachelor's Degree or Higher", percent_x = T)
plot_hist(x_var = HHI, x_lab = "Household Income", dollar_x = T)
plot_hist(x_var = Per.Capita.Income, x_lab = "Per Capita Income", dollar_x = T)
plot_hist(x_var = Age, x_lab = "Age", comma_x = T)
plot_hist(x_var = RUCC_2013, x_lab = "RUCC", comma_x = T)
plot_hist(x_var = Funding_per_Capita, x_lab = "Funding per Capita", dollar_x = T)

# Subset the function and run for race
plm_data_race_sp <- plm_data_sort %>% subset(Year %in% c(2020, 2021)) 

# Feed this into the function
plot_hist(x_var = Race_White_percent, df_in = plm_data_race_sp, x_lab = "Percent White", percent_x = T)
plot_hist(x_var = Race_BlackAfricanAmerican_percent, df_in = plm_data_race_sp, x_lab = "Percent Black", percent_x = T)
plot_hist(x_var = Race_AmericanIndianAlaskanNative_percent, df_in = plm_data_race_sp, x_lab = "Percent Native American", percent_x = T)
plot_hist(x_var = Race_Asian_percent, df_in = plm_data_race_sp, x_lab = "Percent Asian", percent_x = T)
plot_hist(x_var = Race_NativeHawaiianOtherPacificIslander_percent, df_in = plm_data_race_sp, x_lab = "Percent Hawaiian or Pacific Islander", percent_x = T)
plot_hist(x_var = Race_Other_percent, df_in = plm_data_race_sp, x_lab = "Percent Other Race", percent_x = T)

# Group races
plm_data_race_sp$Race_Other_New_Grouping <-  plm_data_race_sp$Race_Asian_percent + plm_data_race_sp$Race_NativeHawaiianOtherPacificIslander_percent + plm_data_race_sp$Race_Other_percent
```

## Supplemental Correlation Analysis

```{r}
# Fix the dataset
plm_data_2021 <- plm_data_sort %>% subset(Year %in% c(2021))

# Group the races
plm_data_2021$Race_Other_New_Grouping <-  plm_data_2021$Race_Asian_percent + plm_data_2021$Race_NativeHawaiianOtherPacificIslander_percent + plm_data_2021$Race_Other_percent
```

## Table 1

```{r}
cor.test(plm_data_2021$SubscriptionRate_Any, plm_data_2021$Education_LessThanHighSchoolGraduate)
cor.test(plm_data_2021$SubscriptionRate_Any, plm_data_2021$Education_HighSchoolGraduate)
cor.test(plm_data_2021$SubscriptionRate_Any, plm_data_2021$Education_SomeCollegeAssociates)
cor.test(plm_data_2021$SubscriptionRate_Any, plm_data_2021$Education_Bachelor.s)
cor.test(plm_data_2021$SubscriptionRate_Any, plm_data_2021$HHI)
cor.test(plm_data_2021$SubscriptionRate_Any, plm_data_2021$Per.Capita.Income)
cor.test(plm_data_2021$SubscriptionRate_Any, plm_data_2021$Race_White_percent)
cor.test(plm_data_2021$SubscriptionRate_Any, plm_data_2021$Race_BlackAfricanAmerican_percent)
cor.test(plm_data_2021$SubscriptionRate_Any, plm_data_2021$Race_AmericanIndianAlaskanNative_percent)
cor.test(plm_data_2021$SubscriptionRate_Any, plm_data_2021$Race_Other_New_Grouping)
cor.test(plm_data_2021$SubscriptionRate_Any, plm_data_2021$Age)
cor.test(plm_data_2021$SubscriptionRate_Any, plm_data_2021$RUCC_2013)
```

# 2. Granger Causality Analysis

## Create Lagged Variables

```{r}
lag_sr <- c(NA, plm_data_sort$SubscriptionRate_Any[1:4])

# Create a vector of lagged subscription rates
for (i in 2:72) {
  # Extract the data
  new_lag <- c(NA, plm_data_sort$SubscriptionRate_Any[(5*(i-1)+1):(5*(i-1)+4)])
  # Append to the original data frame
  lag_sr <- c(lag_sr, new_lag)
}

# Append to the data
plm_data_sort$SubscriptionRate_Any_lag1 <- lag_sr
```

```{r}
# Create a vector of lagged funding
lag_fund <- c(NA, plm_data_sort$Funding[1:4])

# Create a vector of lagged subscription rates
for (i in 2:72) {
  # Extract the data
  new_lag <- c(NA, plm_data_sort$Funding[(5*(i-1)+1):(5*(i-1)+4)])
  # Append to the original data frame
  lag_fund <- c(lag_fund, new_lag)
}

# Append to the data
plm_data_sort$Funding_lag1 <- lag_fund
```

```{r}
# Create a vector of lagged funding
lag_funding_per_capita <- c(NA, plm_data_sort$Funding_per_Capita[1:4])

# Create a vector of lagged subscription rates
for (i in 2:72) {
  # Extract the data
  new_lag <- c(NA, plm_data_sort$Funding_per_Capita[(5*(i-1)+1):(5*(i-1)+4)])
  # Append to the original data frame
  lag_funding_per_capita <- c(lag_funding_per_capita, new_lag)
}

# Append to the data
plm_data_sort$Funding_per_Capita_lag1 <- lag_funding_per_capita
```

```{r}
# Cut down the data again
plm_data_smaller <- subset(plm_data_sort, Year > 2017)
```

```{r}
# Throw into a model
sub_model_ar1_pc <- lm(SubscriptionRate_Any ~ County + County*SubscriptionRate_Any_lag1, data = plm_data_smaller)
summary(sub_model_ar1_pc)
plot(sub_model_ar1_pc)
anova(sub_model_ar1_pc)
acf(sub_model_ar1_pc$residuals)

# Set up large model for Granger Causality
sub_model_ar1_wlf_pc <- lm(SubscriptionRate_Any ~ County + County*SubscriptionRate_Any_lag1 + County*Funding_per_Capita_lag1, data = plm_data_smaller)
anova(sub_model_ar1_wlf_pc)
summary(sub_model_ar1_wlf_pc)
acf(sub_model_ar1_wlf_pc$residuals)

# Test Granger Causality
sub_model_ar1_wlf_pc_red <- lm(SubscriptionRate_Any ~ County + County*SubscriptionRate_Any_lag1, data = plm_data_smaller)
anova(sub_model_ar1_wlf_pc, sub_model_ar1_wlf_pc_red)

## Funding per capita is not useful in predicting changes in the subscription rate

# Test reverse GC
funding_model_ar1_wlf_pc <- lm(Funding_per_Capita ~ County + County*Funding_per_Capita_lag1 + County*SubscriptionRate_Any_lag1, data = plm_data_smaller)
anova(funding_model_ar1_wlf_pc)
coef(funding_model_ar1_wlf_pc)
summary(funding_model_ar1_wlf_pc)
acf(funding_model_ar1_wlf_pc$residuals)
## Subscription rate is useful in explaining changes in funding

# Test Granger Causality
funding_model_ar1_wlf_pc_red <- lm(Funding_per_Capita ~ County + County*Funding_per_Capita_lag1, data = plm_data_smaller)
anova(funding_model_ar1_wlf_pc, funding_model_ar1_wlf_pc_red)
```

## Table 3

```{r}
# Granger Causality Summary

# Enter things row by row
r1 <- c("Does Lagged Per Capita Funding Help Predict Changes in Subscription Rates?", "", "")
r2 <- c("Variable", "Unadjusted P-Value", "Number of Lags Considered")
r3 <- c("Funding Per Capita", 0.6717, 1)

r4 <- c("Do Lagged Subscription Rates Help Predict Changes in the Funding Rate?", "", "")
r5 <- c("Variable", "Unadjusted P-Value", "Number of Lags Considered")
r6 <- c("Subscription Rate", "<0.001", 1)

# Combine into a table
granger_summary_table <- rbind(r1, r2, r3, r4, r5, r6) %>% as.data.frame()

# Create a nice summary table
granger_summary_table_out <- granger_summary_table %>% kable(row.names= F, col.names = NULL) %>% 
  kable_styling("striped", full_width = F) %>%
  row_spec(c(1,2,4,5),bold=T,hline_after = T) %>%
footnote(general = c("All variables are considered at the county level"))

# NOTE: Here is the granger summary table
granger_summary_table_out
```

Granger Causality results:

Is lagged per capita funding useful in predicting subscription rates in the presence of lagged subscription rate? No.

Is lagged subscription rate useful in predicting funding rates in the presence of lagged funding rates? Yes.

Thus, we can say that subscription rates Granger cause funding rates. 

## Supplemental Granger Causality Summary

```{r}
# Throw out county effect to get an idea of the FPC sign
funding_model_ar1_wlf_pc_nc_simp <- lm(Funding_per_Capita ~ County + Funding_per_Capita_lag1 + SubscriptionRate_Any_lag1, data = plm_data_smaller)
anova(funding_model_ar1_wlf_pc_nc_simp)
# coef(funding_model_ar1_wlf_pc_nc)
summary(funding_model_ar1_wlf_pc_nc_simp)
acf(funding_model_ar1_wlf_pc_nc_simp$residuals)

# This is the best "overall" summary
funding_model_ar1_wlf_pc_nc_simpler <- lm(Funding_per_Capita ~ SubscriptionRate_Any_lag1 + Funding_per_Capita_lag1, data = plm_data_smaller)
anova(funding_model_ar1_wlf_pc_nc_simpler)
# coef(funding_model_ar1_wlf_pc_nc)
summary(funding_model_ar1_wlf_pc_nc_simpler)
acf(funding_model_ar1_wlf_pc_nc_simpler$residuals)

# Compare the models
anova(funding_model_ar1_wlf_pc_nc_simp, funding_model_ar1_wlf_pc_nc_simpler)
```

## Table S4

```{r}
# Supplemental Granger Causality Summary
nc_sr_gc_summary <- summary(funding_model_ar1_wlf_pc_nc_simpler)

# Convert to a table
nc_sr_gc_table <- nc_sr_gc_summary$coefficients

# Update the row and column names
rownames(nc_sr_gc_table) <- c("Intercept",  "Lagged Subscription Rate", "Lagged Funding per Capita")
colnames(nc_sr_gc_table) <- c("Estimate", "Std. Error", "T Statistic", "P-Value")

# Round the data
nc_sr_gc_table <- nc_sr_gc_table %>% signif(digits = 4)

# Update the entry with the rounded p-value
nc_sr_gc_table[3,4] <- "<0.0001"

# Format the table nicely
nc_sr_gc_table_out <- nc_sr_gc_table %>% 
  # Round
  # signif(digits = 3) %>% 
  kable() %>% 
  kable_styling("striped", full_width = F) %>%
  # row_spec(2,bold=T,hline_after = T) %>%
footnote(general = c("Ignores county effects"))

nc_sr_gc_table_out
```

# 3. Drivers of Subscription Rates

## Table 4

```{r}
# Table of variables used

# Create vectors
`Key Predictors` <- c("Prior Year Subscription Rate ", "Prior Year Funding per Capita", "Year")
Geographic <- c("County", "", "")
Economic <- c("Per-Capita Income", "Household Income", "")
Demographic <- c("Population", "Age", "")
Education <- c("High School Graduate", "Some College/Associate's Degree", "Bachelor's Degree")

# Combine together
variables_df <- rbind(`Key Predictors`, Geographic, Economic, Demographic, Education) %>% as.data.frame()

# Make the table ready for export
variable_summary_table_out <- variables_df %>% kable(caption = "Variables Considered in the Full Model to Predict Subscription Rate", col.names = NULL) %>% kable_styling("striped", full_width = F) %>% kable_styling() %>% footnote(general = c("All variables are considered at the county level", "Year is treated as a factor", "Lagged Funding Rate and Lagged Subscription Rates are Interacted with County")) %>% column_spec(1, bold = T, border_right = T)  

# NOTE: Here is the variable summary table
variable_summary_table_out
```

```{r}
# Try a "kitchen-sink" model
sub_model_ar1_wlf_pc_allrhs <- lm(SubscriptionRate_Any ~ Year_fact + County + County*SubscriptionRate_Any_lag1 + County*Funding_per_Capita_lag1 +
Population + Education_HighSchoolGraduate+ 
  Education_SomeCollegeAssociates + Education_Bachelor.s +
  HHI_num + Age + Per.Capita.Income, data = plm_data_smaller)
anova(sub_model_ar1_wlf_pc_allrhs)
summary(sub_model_ar1_wlf_pc_allrhs)

sub_model_ar1_wlf_pc_allrhs$coefficients

# Try a BIC step function for a more parsimonious model
sub_model_ar1_wlf_pc_BIC <- stats::step(sub_model_ar1_wlf_pc_allrhs, k = log(nrow(plm_data_smaller)))
anova(sub_model_ar1_wlf_pc_BIC)
summary(sub_model_ar1_wlf_pc_BIC)
plot(sub_model_ar1_wlf_pc_BIC)
# Dramatically simpler model
```

## Table 5

```{r}
# Extract the coefficients
sub_model_ar1_wlf_pc_BIC_sum = summary(sub_model_ar1_wlf_pc_BIC)

# Save as a different table
BIC_model_coeffs <- sub_model_ar1_wlf_pc_BIC_sum$coefficients[,1:2]

# Convert first three columns back to numeric
BIC_model_coeffs <- as.data.frame(BIC_model_coeffs)
BIC_model_coeffs$Estimate <- BIC_model_coeffs$Estimate %>% as.numeric() %>% signif(digits = 3)
BIC_model_coeffs$`Std. Error` <- BIC_model_coeffs$`Std. Error` %>% as.numeric() %>% signif(digits = 3)
# BIC_model_coeffs$`t value` <- BIC_model_coeffs$`t value` %>% as.numeric()%>% signif(digits = 3)

# Add nice column names
rownames(BIC_model_coeffs) <- c("2018 Intercept", "Diff. between 2019 and 2018 Intercept", "Diff. between 2020 and 2018 Intercept", "Diff. between 2021 and 2018 Intercept", "Prior Year Subscription Rate")
colnames(BIC_model_coeffs) <- c("Estimate", "Std. Error")

# Report the results in a nice table
BIC_model_coeffs_summary_table_out <- BIC_model_coeffs %>% kable(caption = "Coefficients from the Reduced Model") %>% kable_styling("striped", full_width = F) %>% kable_styling()   

# NOTE: Here is the BIC model coefficients table
BIC_model_coeffs_summary_table_out
```

The above analysis takes the largest possible model and cuts it down using the Bayesian information criterion. While the p-value on the regression coefficient is somewhat hard to interpret since it comes after using the BIC, it does indicate that the lagged subscription rate is the most crucial variable by far to include in the model. 

## No lagged subscription rates analysis

```{r}
# Analysis on 4.13 - kick out lagged subscription rates
sub_model_ar1_wlf_pc_no_lsr <- lm(SubscriptionRate_Any ~ Year_fact + County + County*Funding_per_Capita_lag1 +
Population + Education_HighSchoolGraduate+ 
  Education_SomeCollegeAssociates + Education_Bachelor.s +
  HHI_num + Age + Per.Capita.Income, data = plm_data_smaller)
anova(sub_model_ar1_wlf_pc_no_lsr)
summary(sub_model_ar1_wlf_pc_no_lsr)

# Run through step function
sub_model_ar1_wlf_pc_BIC_no_lsr <- stats::step(sub_model_ar1_wlf_pc_no_lsr, k = log(nrow(plm_data_smaller)))
anova(sub_model_ar1_wlf_pc_BIC_no_lsr)
summary(sub_model_ar1_wlf_pc_BIC_no_lsr)
plot(sub_model_ar1_wlf_pc_BIC_no_lsr)
```

# 4. Assumption Checking

## Spatial Correlation

```{r}
# NOTE: You may need to update this file location

# Load in the list of counties
County_Region <- read.csv("Wisconsin Counties with Longitude CSV.csv")

# Join this
plm_data_joined <- merge(plm_data_smaller, County_Region, by = "County")
```

```{r}
# Convert year_fact to a factor
plm_data_joined$Year_fact <- as.factor(plm_data_joined$Year_fact)

# Add in the revised latitude and longitude
Wisconsin_lat_long_2020 <- USpopcenters::county2020 %>% filter(STNAME == "Wisconsin") %>% dplyr::select(COUNAME, LATITUDE, LONGITUDE)

# Rename and fix the county names
colnames(Wisconsin_lat_long_2020) <- c("COUNAME", "Latitude_2020", "Longitude_2020")

plm_data_joined_2020_ll <- merge(plm_data_joined, Wisconsin_lat_long_2020, by.x = "County", by.y = "COUNAME")
```


```{r}
# Initial work on the variogram
sub_model_ar1_wlf_pc_allrhs_var <- lm(SubscriptionRate_Any ~ Year_fact + County + County*SubscriptionRate_Any_lag1 + County*Funding_per_Capita_lag1 +
Population +Education_HighSchoolGraduate+ 
  Education_SomeCollegeAssociates + Education_Bachelor.s +
  HHI_num + Age + Per.Capita.Income, data = plm_data_joined_2020_ll)
anova(sub_model_ar1_wlf_pc_allrhs_var)
summary(sub_model_ar1_wlf_pc_allrhs_var)

# Form a new dataframe
plm_data_resid <- as.data.frame(cbind(plm_data_joined_2020_ll$Latitude_2020, plm_data_joined_2020_ll$Longitude_2020, sub_model_ar1_wlf_pc_allrhs_var$residuals, plm_data_joined_2020_ll$Year))
# Check that it changes
plm_data_resid_old <- as.data.frame(cbind(plm_data_joined_2020_ll$Latitude, plm_data_joined_2020_ll$Longitude, sub_model_ar1_wlf_pc_allrhs_var$residuals, plm_data_joined_2020_ll$Year))

# Rename variables
colnames(plm_data_resid) <- c("Latitude", "Longitude", "Residuals", "Year")

# Extract 4 different years
plm_data_resid_18 <- plm_data_resid %>% subset(Year == 2018)
  
#create geodata object from dat - this assumes the coordinates are in columns 1 and 2#
geodat_18 <- as.geodata(plm_data_resid_18, coord.col = 1:2, data.col = 3)

#create variogram and plot
vario_18 <- variog(geodat_18)
plot(vario_18)

#this website does a good job of explaining the basic procedure#
# http://leg.ufpr.br/geoR/geoRdoc/geoRintro.html

##mc envelope##
geodat_18_mc <- variog.mc.env(geodat_18, obj.variog = vario_18)

# Plot the data
plot(vario_18$u, vario_18$v, ylim = c(0, max(geodat_18_mc$v.upper)))
points(x = geodat_18_mc$u, y = geodat_18_mc$v.lower, col = "red")
points(x = geodat_18_mc$u, y = geodat_18_mc$v.upper, col = "red")
```

```{r}
#' A function to compute the variogram
#'
#' @param input_df - the input data frame, with latitude, then longitude, then residuals
#' @param year - a year for the title of the plot
#' @param model_desc - a written description of the model
#'
#' @return - a plot
#' @export
#'
#' @examples - # plot_variogram(input_df = plm_data_resid_18, "2018", "Full Model")
plot_variogram <- function(input_df, year, model_desc) {
  #create geodata object from dat - this assumes the coordinates are in columns 1 and 2#
  geodat <- as.geodata(input_df, coord.col = 1:2, data.col = 3)
  #create variogram and plot
  vario <- variog(geodat)
  ##mc envelope##
  geodat_mc <- variog.mc.env(geodat, obj.variog = vario)
  # Plot the data
  plot(vario$u, vario$v, ylim = c(0, max(geodat_mc$v.upper)), xlab = "Distance", ylab = "Semivariance", main = paste0("Variogram for the ", model_desc, " in ", year))
  points(x = geodat_mc$u, y = geodat_mc$v.lower, col = "red")
  points(x = geodat_mc$u, y = geodat_mc$v.upper, col = "red")
}
```

```{r}
# Extract 4 different years
plm_data_resid_18 <- plm_data_resid %>% subset(Year == 2018)
plm_data_resid_19 <- plm_data_resid %>% subset(Year == 2019)
plm_data_resid_20 <- plm_data_resid %>% subset(Year == 2020)
plm_data_resid_21 <- plm_data_resid %>% subset(Year == 2021)
```

## Figure S1: Variogram for Full Model

```{r}
par(mfrow = c(2,2))

# Plot the variogram for the full models
plot_variogram(input_df = plm_data_resid_18, "2018", "Full Model")
plot_variogram(input_df = plm_data_resid_19, "2019", "Full Model")
plot_variogram(input_df = plm_data_resid_20, "2020", "Full Model")
plot_variogram(input_df = plm_data_resid_21, "2021", "Full Model")
```

### General Model Diagnostics for Full Model

```{r}
# par(mfrow = c(2,2))
plot(sub_model_ar1_wlf_pc_allrhs_var)
summary(sub_model_ar1_wlf_pc_allrhs_var)
```

### Robustness of Standard Errors Test for Full Model

```{r}
vcov <- vcovHC(sub_model_ar1_wlf_pc_allrhs_var, type = "HC1")
# coeftest(sub_model_ar1_wlf_pc_allrhs_var, vcov. = vcov)
# summary(sub_model_ar1_wlf_pc_allrhs_var)
```

## Variogram for BIC Reduced Model

```{r}
# Look at the full model on the new dataset
sub_model_reduced <- lm(SubscriptionRate_Any ~ Year_fact + SubscriptionRate_Any_lag1, data = plm_data_joined_2020_ll)
anova(sub_model_reduced)
summary(sub_model_reduced)

# Form a new dataframe
plm_data_resid_red <- as.data.frame(cbind(plm_data_joined_2020_ll$Latitude_2020, plm_data_joined_2020_ll$Longitude_2020, sub_model_reduced$residuals, plm_data_joined_2020_ll$Year))

# Rename variables
colnames(plm_data_resid_red) <- c("Latitude", "Longitude", "Residuals", "Year")

# Extract 4 different years
plm_data_resid_red_18 <- plm_data_resid_red %>% subset(Year == 2018)
plm_data_resid_red_19 <- plm_data_resid_red %>% subset(Year == 2019)
plm_data_resid_red_20 <- plm_data_resid_red %>% subset(Year == 2020)
plm_data_resid_red_21 <- plm_data_resid_red %>% subset(Year == 2021)
```

## Figure S2: Variogram for Reduced Model

```{r}
par(mfrow = c(2,2))

# Plot the variogram for the full models
plot_variogram(input_df = plm_data_resid_red_18, "2018", "Reduced Mod.")
plot_variogram(input_df = plm_data_resid_red_19, "2019", "Reduced Mod.")
plot_variogram(input_df = plm_data_resid_red_20, "2020", "Reduced Mod.")
plot_variogram(input_df = plm_data_resid_red_21, "2021", "Reduced Mod.")
```

### General Model Diagnostics for BIC Reduced Model

```{r}
# par(mfrow = c(2,2))
plot(sub_model_reduced)
```

### Robustness of Standard Errors Test for BIC Reduced Model

```{r}
vcov <- vcovHC(sub_model_reduced, type = "HC1")
# coeftest(sub_model_reduced, vcov. = vcov)
# summary(sub_model_reduced)
```

## Variogram for Granger Causality Test of Subscription Rate Predicting Funding Rate

```{r}
# Test reverse GC
funding_model_sr_gt <- lm(Funding_per_Capita ~ County + County*Funding_per_Capita_lag1 + County*SubscriptionRate_Any_lag1, data = plm_data_joined_2020_ll)
summary(funding_model_sr_gt)

# Form a new dataframe
plm_data_resid_sr_gt <- as.data.frame(cbind(plm_data_joined_2020_ll$Latitude_2020, plm_data_joined_2020_ll$Longitude_2020, funding_model_sr_gt$residuals, plm_data_joined_2020_ll$Year))

# Rename variables
colnames(plm_data_resid_sr_gt) <- c("Latitude", "Longitude", "Residuals", "Year")

# Extract 4 different years
plm_data_resid_sr_gt_18 <- plm_data_resid_sr_gt %>% subset(Year == 2018)
plm_data_resid_sr_gt_19 <- plm_data_resid_sr_gt %>% subset(Year == 2019)
plm_data_resid_sr_gt_20 <- plm_data_resid_sr_gt %>% subset(Year == 2020)
plm_data_resid_sr_gt_21 <- plm_data_resid_sr_gt %>% subset(Year == 2021)
```

## Figure S3: Variogram for Subscription Rate Granger Causality Test

```{r}
par(mfrow = c(2,2))

# Plot the variogram for the full models
plot_variogram(input_df = plm_data_resid_sr_gt_18, "2018", "Sub. Rt. G.C.")
plot_variogram(input_df = plm_data_resid_sr_gt_19, "2019", "Sub. Rt. G.C.")
plot_variogram(input_df = plm_data_resid_sr_gt_20, "2020", "Sub. Rt. G.C.")
plot_variogram(input_df = plm_data_resid_sr_gt_21, "2021", "Sub. Rt. G.C.")

# Nothing much going on here.
```

### General Model Diagnostics for Granger Causality Test of Subscription Rate Predicting Funding Rate

```{r}
plot(funding_model_sr_gt)
```

Based on the above plots, there does not appear to be a clear problem with spatial correlation in the Granger models, the full model, or the reduced BIC model.

## Variogram for Granger Causality Test of Funding Rate Predicting Subscription Rate

```{r}
# Test Granger causality
sub_model_fr_gt <- lm(SubscriptionRate_Any ~ County + County*SubscriptionRate_Any_lag1 + County*Funding_per_Capita_lag1, data = plm_data_joined_2020_ll)
summary(sub_model_fr_gt)

# Form a new dataframe
plm_data_resid_fr_gt <- as.data.frame(cbind(plm_data_joined_2020_ll$Latitude_2020, plm_data_joined_2020_ll$Longitude_2020, sub_model_fr_gt$residuals, plm_data_joined_2020_ll$Year))

# Rename variables
colnames(plm_data_resid_fr_gt) <- c("Latitude", "Longitude", "Residuals", "Year")

# Extract 4 different years
plm_data_resid_fr_gt_18 <- plm_data_resid_fr_gt %>% subset(Year == 2018)
plm_data_resid_fr_gt_19 <- plm_data_resid_fr_gt %>% subset(Year == 2019)
plm_data_resid_fr_gt_20 <- plm_data_resid_fr_gt %>% subset(Year == 2020)
plm_data_resid_fr_gt_21 <- plm_data_resid_fr_gt %>% subset(Year == 2021)
```

## Figure S4: Variogram for Funding Rate Granger Causality Test

```{r}
par(mfrow = c(2,2))

# Plot the variogram for the full models
plot_variogram(input_df = plm_data_resid_fr_gt_18, "2018", "Fund. Rt. G.C.")
plot_variogram(input_df = plm_data_resid_fr_gt_19, "2019", "Fund. Rt. G.C.")
plot_variogram(input_df = plm_data_resid_fr_gt_20, "2020", "Fund. Rt. G.C.")
plot_variogram(input_df = plm_data_resid_fr_gt_21, "2021", "Fund. Rt. G.C.")

# Nothing much going on here.
```

### General Model Diagnostics for Granger Causality Test of Funding Rate Predicting Subscription Rate

```{r}
plot(sub_model_fr_gt)
```

## Robustness of Standard Errors Test for Granger Causality Test of Funding Rate Predicting Subscription Rate

```{r}
vcov <- vcovHC(sub_model_fr_gt, type = "HC1")
# coeftest(sub_model_fr_gt, vcov. = vcov)
# summary(sub_model_fr_gt)
# 
## Not much of a difference
```

## Variogram for Simplified Granger Causality Test

```{r}
# Test reverse GC
funding_model_sr_gt_simp <- lm(Funding_per_Capita ~ Funding_per_Capita_lag1 + SubscriptionRate_Any_lag1, data = plm_data_joined_2020_ll)
summary(funding_model_sr_gt_simp)

# Form a new dataframe
plm_data_resid_sr_gt_simp <- as.data.frame(cbind(plm_data_joined_2020_ll$Latitude_2020, plm_data_joined_2020_ll$Longitude_2020, funding_model_sr_gt_simp$residuals, plm_data_joined_2020_ll$Year))

# Rename variables
colnames(plm_data_resid_sr_gt_simp) <- c("Latitude", "Longitude", "Residuals", "Year")

# Extract 4 different years
plm_data_resid_sr_gt_18_simp <- plm_data_resid_sr_gt_simp %>% subset(Year == 2018)
plm_data_resid_sr_gt_19_simp <- plm_data_resid_sr_gt_simp %>% subset(Year == 2019)
plm_data_resid_sr_gt_20_simp <- plm_data_resid_sr_gt_simp %>% subset(Year == 2020)
plm_data_resid_sr_gt_21_simp <- plm_data_resid_sr_gt_simp %>% subset(Year == 2021)
```

## Figure S5: Variogram for Simplified Granger Causality Test

```{r}
par(mfrow = c(2,2))

# Plot the variogram for the full models
plot_variogram(input_df = plm_data_resid_sr_gt_18_simp, "2018", "Simp. S.R. G.C.")
plot_variogram(input_df = plm_data_resid_sr_gt_19_simp, "2019", "Simp. S.R. G.C.")
plot_variogram(input_df = plm_data_resid_sr_gt_20_simp, "2020", "Simp. S.R. G.C.")
plot_variogram(input_df = plm_data_resid_sr_gt_21_simp, "2021", "Simp. S.R. G.C.")

# Nothing much going on here.
```

### General Model Diagnostics for Granger Causality Test of Subscription Rate Predicting Funding Rate

```{r}
plot(funding_model_sr_gt)
```

# 5. Supplemental Tables

## Large Model - Z-Scale

```{r}
# Create a new dataset
plm_data_rescale <- plm_data_smaller

# Write a function to rescale the variables
rescale_var <- function(var) {
  # Rescale the variable
  var_out <- (var - mean(var))/sd(var)
  # Return the variable
  return(var_out)
}

# Rescale the numeric variables
plm_data_rescale$SubscriptionRate_Any <- rescale_var(plm_data_rescale$SubscriptionRate_Any)
plm_data_rescale$SubscriptionRate_Any_lag1 <- rescale_var(plm_data_rescale$SubscriptionRate_Any_lag1)
plm_data_rescale$Funding_per_Capita_lag1 <- rescale_var(plm_data_rescale$Funding_per_Capita_lag1)
plm_data_rescale$Population <- rescale_var(plm_data_rescale$Population)
plm_data_rescale$Education_HighSchoolGraduate <- rescale_var(plm_data_rescale$Education_HighSchoolGraduate)
plm_data_rescale$Education_SomeCollegeAssociates <- rescale_var(plm_data_rescale$Education_SomeCollegeAssociates)
plm_data_rescale$Education_Bachelor.s <- rescale_var(plm_data_rescale$Education_Bachelor.s)
plm_data_rescale$HHI_num <- rescale_var(plm_data_rescale$HHI_num)
plm_data_rescale$Age <- rescale_var(plm_data_rescale$Age)
plm_data_rescale$Per.Capita.Income <- rescale_var(plm_data_rescale$Per.Capita.Income)

# Try a "kitchen-sink"
sub_model_ar1_wlf_pc_allrhs_rescale <- lm(SubscriptionRate_Any ~ County + County*SubscriptionRate_Any_lag1 + County*Funding_per_Capita_lag1 +
Population + Education_HighSchoolGraduate+ 
  Education_SomeCollegeAssociates + Education_Bachelor.s +
  HHI_num + Age + Per.Capita.Income, data = plm_data_rescale)
anova(sub_model_ar1_wlf_pc_allrhs_rescale)
summary(sub_model_ar1_wlf_pc_allrhs_rescale)

# Try a BIC step function for a more parsimonious model
sub_model_ar1_wlf_pc_BIC_rescale <- stats::step(sub_model_ar1_wlf_pc_allrhs_rescale, k = log(nrow(plm_data_smaller)))
anova(sub_model_ar1_wlf_pc_BIC_rescale)
summary(sub_model_ar1_wlf_pc_BIC_rescale)
plot(sub_model_ar1_wlf_pc_BIC_rescale)
```

## Table S2: No County-Level Interactions

```{r}
# Model with no county
sub_model_ar1_wlf_pc_allrhs_no_county_int <- lm(SubscriptionRate_Any ~ SubscriptionRate_Any_lag1 + Funding_per_Capita_lag1 +
Population +  Age +  Education_HighSchoolGraduate+ 
  Education_SomeCollegeAssociates + Education_Bachelor.s +
  HHI_num +  Per.Capita.Income + Year_fact + County, data = plm_data_smaller)
anova(sub_model_ar1_wlf_pc_allrhs_no_county_int)
summary(sub_model_ar1_wlf_pc_allrhs_no_county_int)

# Make a nice table
no_int_summary <- summary(sub_model_ar1_wlf_pc_allrhs_no_county_int)
no_int_summary <- no_int_summary$coefficients %>% as.data.frame()

# Format the row names
no_int_summary_names <- rownames(no_int_summary)

# Trim county from the beginning and put at the end
no_int_summary_names <- gsub('County', '', no_int_summary_names)

no_int_summary_names[14:84] <- paste0(no_int_summary_names[14:84], " County")

# Give nice names to the other rows
no_int_summary_names[1:13] <- c("Intercept", "Prior Year Subscription Rate", "Prior Year Funding Rate",
                                "Population", "Age",
                                "High School Graduate", "Some College/Associates Degree",
                                "Bachelor's Degree",
                                "Household Income", "Per Capita Income",
                                "Diff. between 2019 and 2018 Intercept",
                                "Diff. between 2020 and 2018 Intercept",
                                "Diff. between 2021 and 2018 Intercept")

# Replace the rownames
rownames(no_int_summary) <- no_int_summary_names

# Replace the column names
colnames(no_int_summary) <- c("Estimate", "Std. Error", "T-value", "P-value")

# Format the data nicely
no_int_summary_table_out <- no_int_summary %>% signif(digits = 4) %>% kable(caption = "Coefficients from the Full Model without County Interactions", digits = 4) %>% kable_styling("striped", full_width = F) %>% kable_styling() %>%
footnote(general = c("Adams is the reference county, and 2018 is the reference year"))

# NOTE: Here is the summary table with no county interaction
no_int_summary_table_out
```

```{r}
# Compare BICs of two models
BIC(sub_model_ar1_wlf_pc_allrhs_no_county_int)
BIC(sub_model_ar1_wlf_pc_allrhs)

# Feed into stepwise model
sub_model_ar1_wlf_pc_BIC_no_county <- stats::step(sub_model_ar1_wlf_pc_allrhs_no_county_int, k = log(nrow(plm_data_smaller)))
anova(sub_model_ar1_wlf_pc_BIC_no_county)
summary(sub_model_ar1_wlf_pc_BIC_no_county)
plot(sub_model_ar1_wlf_pc_BIC_no_county)
```

## Table S3: No County-Level Interactions on z-scale

```{r}
# Model with no county
sub_model_ar1_wlf_pc_allrhs_no_county_int_rescale <- lm(SubscriptionRate_Any ~ SubscriptionRate_Any_lag1 + Funding_per_Capita_lag1 +
Population +  Age +  Education_HighSchoolGraduate+ 
  Education_SomeCollegeAssociates + Education_Bachelor.s +
  HHI_num +  Per.Capita.Income + Year_fact + County, data = plm_data_rescale)
anova(sub_model_ar1_wlf_pc_allrhs_no_county_int_rescale)
summary(sub_model_ar1_wlf_pc_allrhs_no_county_int_rescale)

# Make a nice table
no_int_summary_rs <- summary(sub_model_ar1_wlf_pc_allrhs_no_county_int_rescale)
no_int_summary_rs <- no_int_summary_rs$coefficients %>% as.data.frame()

# Format the row names
no_int_summary_names_rs <- rownames(no_int_summary_rs)

# Trim county from the beginning and put at the end
no_int_summary_names_rs <- gsub('County', '', no_int_summary_names_rs)
no_int_summary_names_rs[14:84] <- paste0(no_int_summary_names_rs[14:84], " County")

# Give nice names to the other rows
no_int_summary_names_rs[1:13] <- c("Intercept", "Prior Year Subscription Rate", "Prior Year Funding Rate",
                                "Population", "Age",
                                "High School Graduate", "Some College/Associates Degree",
                                "Bachelor's Degree",
                                "Household Income", "Per Capita Income",
                                "Diff. between 2019 and 2018 Intercept",
                                "Diff. between 2020 and 2018 Intercept",
                                "Diff. between 2021 and 2018 Intercept")

# Replace the rownames
rownames(no_int_summary_rs) <- no_int_summary_names_rs

# Replace the column names
colnames(no_int_summary_rs) <- c("Estimate", "Std. Error", "T-value", "P-value")

# Format the data nicely
no_int_summary_table_rescaled_out <- no_int_summary_rs %>% signif(digits = 4) %>% kable(caption = "Rescaled Coefficients from the Full Model without County Interactions", digits = 4) %>% kable_styling("striped", full_width = F) %>% kable_styling() %>%
footnote(general = c("Adams is the reference county, and 2018 is the reference year", "Rescaled to Z-Scale")) 

# NOTE: Here is the summary table with no county interaction with rescaled variables
no_int_summary_table_rescaled_out
```

```{r}
# If you see the message below, the code ran successfully.
print("The code ran successfully!")

# Record the system time
toc <- Sys.time()

# Measure the difference
time_to_run <-  toc - tic
print(time_to_run)
```